Query Term Expansion by Automatic Learning of Morphological Equivalence Patterns from Wikipedia
Authors
Abstract
Retrieval in many languages would benefit from language-specific processing, such as stemming or morphological analysis. However, many languages lack such processing tools, or the available tools may be inadequate for retrieval because the language has evolved. In this paper, we explore the use of Wikipedia redirects to automatically learn morphological equivalence patterns. Morphological variants found automatically in Wikipedia redirects are aligned at the character level, and the alignments are used to generate character-level transformations. Then, given a query word, these transformations are applied to produce its morphological equivalents. The proposed method is language independent and can be applied to new languages without the need for linguistic knowledge. Although the performance of this approach may, in the aggregate, lag behind state-of-the-art stemming (or morphological analysis) for languages with good existing processors, the approach is generally safer than stemming in the sense that when it degrades a query, the degradation is generally marginal. Stemming, on the other hand, can significantly degrade queries. We show the success of the approach for Arabic, English, Hungarian, and Portuguese.
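The pipeline described in the abstract (align variant pairs, extract character-level transformations, apply them to a query word) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the alignment algorithm, so `difflib.SequenceMatcher` stands in for it, only word-initial and word-final edits are kept, and the function names `learn_rules` and `expand`, the `min_count` threshold, and the toy word pairs are all hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def learn_rules(pairs):
    """Count character-level rewrite rules (old, new, position) from word pairs.

    Assumption: `pairs` holds (source, target) variants, e.g. a redirect
    title and the title it points to. This sketch keeps only edits that
    touch the start or end of the word.
    """
    rules = Counter()
    for source, target in pairs:
        matcher = SequenceMatcher(None, source, target, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue
            if i2 == len(source) and j2 == len(target):
                position = "suffix"
            elif i1 == 0 and j1 == 0:
                position = "prefix"
            else:
                continue  # word-internal edits are ignored in this sketch
            rules[(source[i1:i2], target[j1:j2], position)] += 1
    return rules


def expand(word, rules, min_count=1):
    """Apply every sufficiently frequent rule to `word`; return all variants."""
    variants = {word}
    for (old, new, position), count in rules.items():
        if count < min_count:
            continue
        if position == "suffix" and word.endswith(old):
            variants.add(word[: len(word) - len(old)] + new)
        elif position == "prefix" and word.startswith(old):
            variants.add(new + word[len(old):])
    return variants


# Toy pairs standing in for variants mined from redirects (hypothetical data).
toy_pairs = [("walk", "walks"), ("play", "plays"), ("city", "cities")]
toy_rules = learn_rules(toy_pairs)
print(sorted(expand("party", toy_rules)))
```

Applying every learned rule over-generates (here the toy rules also yield the non-word "partys"), so a usable system would presumably need to filter or weight the candidates, for example by rule frequency as with `min_count`; how the paper handles this is not stated in the abstract.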
Similar resources
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...
Learning to expand queries using entities
A substantial fraction of web search queries contain references to entities, such as persons, organizations, and locations. Recently, methods that exploit named entities have been shown to be more effective for query expansion than traditional pseudo-relevance feedback methods. In this paper, we introduce a supervised learning approach that exploits named entities for query expansion, using Wik...
Improving Query Expansion for Information Retrieval Using Wikipedia
Query expansion (QE) is one of the key technologies to improve retrieval efficiency. Many studies on query expansion with relationships from single local corpus suffer from two problems resulting in low retrieval performance: term relationships are limited and unlisted query terms have no expansion terms. To address these problems, relationships between terms captured from Wikipedia are superim...
Select, Link and Rank: Diversified Query Expansion and Entity Ranking Using Wikipedia
A search query, being a very concise grounding of user intent, could potentially have many possible interpretations. Search engines hedge their bets by diversifying top results to cover multiple such possibilities so that the user is likely to be satisfied, whatever be her intended interpretation. Diversified Query Expansion is the problem of diversifying query expansion suggestions, so that th...
UTD at TREC 2014: Query Expansion for Clinical Decision Support
This paper describes the medical information retrieval (MIR) systems designed by the University of Texas at Dallas (UTD) for clinical decision support (CDS) which were submitted to the TREC 2014. We investigated the impact of various knowledge bases for automatic query expansion in the four officially submitted runs. Each of these systems exploits both Wikipedia and PubMed corpus statistics in ...
Publication date: 2014